ABSTRACT

This paper proposes a framework for identifying frequently accessed data items over a wireless XML broadcasting
scheme. An efficient XML dissemination scheme is used to support twig pattern queries in the wireless environment.
Twig pattern queries, which contain complex conditions, are common and critical in XML query processing. The mobile
client retrieves the data satisfying a given twig pattern by performing bitwise operations on the Lineage Codes in the
relevant G-nodes. A counter-based algorithm tracks a subset of items from the input and monitors the counts associated
with these items; for each new arrival it decides whether to store the item as a frequent item. The proposed framework
is reliable, efficient and user-friendly.

KEYWORDS:  Attribute  Summarization,  Frequent  Items,  Twig  Pattern  Query,  Wireless  Broadcasting 
INTRODUCTION 

Mobile devices are the building blocks of the wireless environment. Recent advances in communication technology
have greatly increased the functionality of mobile information services. An important application is to provide clients
with various types of real-time information, such as stock quotes, weather conditions and traffic information.
Data broadcasting (Acharya et al. 1995; Kaushik 2004) is an efficient data dissemination strategy that is very
cost-effective for disseminating a substantial amount of information to a large number of mobile clients.

Wireless broadcasting is an effective way to disseminate information in the wireless environment because 1) the
server can support an enormous number of mobile clients without extra cost (i.e., scalability), 2) the broadcast channel
is shared by many clients (i.e., effective utilization of bandwidth), and 3) the mobile clients can receive data without
sending request messages, which consume considerable energy. In today's context, there is a need to conserve the energy
of mobile clients (i.e., energy-efficiency) and to reduce query processing time so that users receive fast responses
(i.e., latency-efficiency). To measure energy-efficiency and latency-efficiency in wireless broadcasting
(Chung et al. 2010; Imielinski 1997), the tuning time and access time are used, respectively. The frequent items problem
(Cormode et al. 2010) is to process a stream of items and find all items that occur more than a given fraction of the
time. Typically, this is formalized as finding all items whose frequency exceeds a specified fraction of the total number
of items. The items can represent packets on the Internet, with weights given by the packet sizes, or queries made by
clients. If the items represent queries, then the frequent items are the currently popular terms. It is important to find
algorithms that can process each new update very quickly, without blocking. It also helps if the working space of the
algorithm is very small. Obtaining efficient and scalable solutions to the frequent items problem is also important
because many streaming applications need to find frequent items as a subroutine of another, more complex computation.

This paper proposes a framework for finding frequent items in a wireless XML streaming scheme that supports
queries in the wireless mobile environment. It addresses 1) a streaming unit called the G-node, which eliminates
structural overheads of XML documents and enables mobile clients to skip downloading irrelevant data during query
processing, 2) algorithms for generating a wireless XML stream of G-nodes and for query processing over that stream,
and 3) algorithms for identifying frequent queries on the client side from a query directory that stores the queries
and their responses.

This paper is organized in sections addressing the problem statement, the system architecture (Figure 1), the
G-node and the attribute summarization technique, Lineage Encoding and its related operators, algorithms for query
processing and for finding frequent queries, and the performance of the proposed method, followed by conclusions and
suggestions for future work. The efficacy of the proposed scheme is validated through experimental results and
comparison against wireless XML streaming and conventional XML query processing methods (Park et al. 2010; Wang 2005).

Figure 1: System Architecture

BACKGROUND 

•  XML 

As a de facto standard for information representation and exchange over the Internet, XML has been used
extensively in many applications. Most Web data sources are represented using the XML specification. As a result, a wide
spectrum of users needs to query XML. An XML document contains hierarchically nested elements and is usually
modeled as a document tree. Elements, attributes, and texts are represented by nodes, and the parent-child relationships
are represented by edges in the XML tree. Figure 2 shows a simple XML document that is used as a running example in
the paper.

•  XPath 

In this paper, XPath (Berglund 2002) is used as the query language. XPath is a well-accepted language for
addressing parts of an XML document. It also serves as a stand-alone query language for XML. Methods
(Chen 2006; Cormode 2010) for efficient evaluation of XPath queries benefit systems for more powerful languages
(e.g., XQuery) which incorporate XPath. An XPath query consists of a location path and an output expression. The
location path is a sequence of location steps that specify the path from the document root to a desired element. The
output expression specifies the portions or functions of a matching element that form the results. Each location step
has an axis, a node test, and an optional predicate.

Q1: //country[@id="f0_162"]/province/text()

[Figure 2 listing: an excerpt of the Mondial XML document. The root element mondial contains country elements
(France, Germany); each country contains province elements (Aquitaine and Ile de France; Bayern and Berlin), and each
province contains city elements with nested population elements. Elements carry attributes such as id, name, country,
province and capital, and the Germany element also has a name child element.]

Figure  2:  An  Example  of  XML  Document 

For example, the location path of the query //country[@id="f0_162"]/province/text() is
//country[@id="f0_162"]/province. The output expression, text(), indicates that only the text content of the matching
province elements appears in the result. In the first location step, //country[@id="f0_162"], // is the closure axis
denoting descendant-or-self, country is the node test, and @id="f0_162" is the predicate. The predicate restricts the
results to the province sub-elements of country elements whose id attribute has the value f0_162.
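
To make the parts of a location step concrete, the following minimal sketch evaluates the example query with
Python's standard xml.etree.ElementTree module, which supports a limited XPath subset. The small document and the
function name province_texts are illustrative assumptions and not the paper's data; ElementTree has no text() step,
so the output expression is emulated by reading the .text of each match.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; only the country id mirrors the example query.
DOC = """
<mondial>
  <country id="f0_162"><province>P1</province><province>P2</province></country>
  <country id="other"><province>P3</province></country>
</mondial>
"""

def province_texts(xml_text, country_id):
    """Emulate //country[@id=...]/province/text() on an in-memory document."""
    root = ET.fromstring(xml_text)
    # ".//country" is the descendant-or-self step, "[@id='...']" the predicate,
    # "/province" the child step of the location path.
    path = f".//country[@id='{country_id}']/province"
    # The output expression text() is emulated by collecting each element's text.
    return [p.text for p in root.findall(path)]

print(province_texts(DOC, "f0_162"))
```

This evaluates the query over a fully downloaded document; the streaming scheme described later avoids exactly
that by letting the client skip irrelevant parts of the broadcast.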

•  Twig  Pattern  Query 

A twig pattern query consists of two or more path expressions; it therefore involves selecting elements that
satisfy complex patterns in tree-structured XML data. The twig pattern query is a core operation in XML query processing
and is widely used because it can represent complex search conditions (Al-Khalifa et al. 2002; Tatarinov et al. 2002).
For example, the twig pattern query "/mondial/country[name/text()="Germany"]/province/city" finds the cities located in
the provinces of a country that has a child element "name" whose text content is "Germany".
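
The branching structure of this twig query can be sketched with the same standard-library module. The sample
document and the helper name cities_of are assumptions for illustration; the predicate [name='...'] is the
ElementTree spelling of the branch name/text()="...".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the running example.
DOC = """
<mondial>
  <country><name>Germany</name>
    <province><city>Berlin</city></province>
    <province><city>Munich</city><city>Nuremberg</city></province>
  </country>
  <country><name>France</name>
    <province><city>Paris</city></province>
  </country>
</mondial>
"""

def cities_of(xml_text, country_name):
    """Main path: country/province/city. Branch (subpath): country/name."""
    root = ET.fromstring(xml_text)  # root is the mondial element
    twig = f"./country[name='{country_name}']/province/city"
    return [city.text for city in root.findall(twig)]

print(cities_of(DOC, "Germany"))
```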

•  Structure  Indexing 

Many techniques (Jun Pyo Park et al. 2013; Wang et al. 2005) using a structure index have been proposed for
efficient XML query processing. A structure index directly captures the structural information of XML documents
and is used for XML query processing. Conventional wireless XML streaming methods (Park et al. 2005; Park et al. 2006)
that use a structure index perform well for simple path queries, benefiting from the size reduction, but
they do not support twig pattern queries.

WIRELESS  XML  STREAM 

In this section, the stream organization for XML data is explained. Section (A) presents the unit structure of the
stream, called the G-node. Section (B) explains attribute summarization. Section (C) presents the XML stream generation
algorithm.

• G-Node

The wireless XML stream consists of a sequence of integrated nodes, called G-nodes (Park 2010).
These integrated groups are useful for selective access. A G-node is denoted by

$G_p = (GD_p, AVL_p, TL_p)$

Definition 1: The G-node is a data structure containing information about all the elements $e_p$ whose location path
is $p$, where $GD_p$ is the group descriptor of $G_p$, $AVL_p$ is a list containing all attribute values of $e_p$, and
$TL_p$ is a list containing all text contents of $e_p$.

Group Descriptor: It is a collection of indices for selective access of a wireless XML stream. It is denoted by

$GD_p = (N_p, LP_p, CI_p, AI_p, TI_p, ST_p, ET_p)$

Node name ($N_p$) is the tag name of the integrated elements, and Location path ($LP_p$) is an XPath expression of the
integrated elements from the root node to the element node in the document tree. Child Index ($CI_p$) is a set of
addresses that point to the starting positions of the child G-nodes in the wireless XML stream.

Attribute Index ($AI_p$) contains pairs of an attribute name and the address of the starting position of the values of
that attribute, which are stored contiguously in the Attribute Value List. Text Index ($TI_p$) is an address pointing to
the starting position of the Text List. Stream Created Time ($ST_p$) is the time when the XML stream is created, and
Expiry Time ($ET_p$) is the validity period, i.e., the time-to-live period of the broadcast data. Figure 3 shows an
example of the Group Descriptor.
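
As a rough illustration of Definition 1 and the group descriptor, the tuples can be written as two small
record types. This is only a sketch: the field names, the use of integers for stream addresses, and the
is_expired helper (which assumes the expiry time is a duration counted from the stream creation time) are
assumptions, not the paper's wire format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class GroupDescriptor:
    node_name: str                                                  # N_p
    location_path: str                                              # LP_p
    child_index: Dict[str, int] = field(default_factory=dict)      # CI_p: child tag -> stream address
    attribute_index: Dict[str, int] = field(default_factory=dict)  # AI_p: attribute name -> start in AVL
    text_index: Optional[int] = None                                # TI_p: start of the text list, if any
    stream_created_time: float = 0.0                                # ST_p
    time_to_live: float = 0.0                                       # ET_p, assumed to be a duration

    def is_expired(self, now: float) -> bool:
        # Assumption: data is valid for time_to_live seconds after creation.
        return now >= self.stream_created_time + self.time_to_live

@dataclass
class GNode:
    descriptor: GroupDescriptor                                     # GD_p
    attribute_values: List[str] = field(default_factory=list)      # AVL_p
    texts: List[str] = field(default_factory=list)                 # TL_p

province = GNode(GroupDescriptor(
    node_name="province",
    location_path="/mondial/country/province",
    stream_created_time=100.0,
    time_to_live=60.0,
))
```

A client that has tuned in to a descriptor can use child_index, attribute_index and text_index to decide
which parts of the stream to download and which to skip.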


[Figure 3 lists the group descriptor fields of G-node Province: Node name (Province), Location Path
(/mondial/country/province), Child Index (city, city address), Lineage Code (V, H), Attribute Index ((id, id address),
(name, name address), (country, country address)), Text Index (null), Stream Generated Time, and Time To Live.]

Figure 3: Group Descriptor (GD) for G-node Province


• Attribute Summarization

Attribute summarization (Park 2005) is a technique used to reduce the size of the wireless XML stream. It exploits a
structural characteristic of XML data: elements with the same tag name and location path often contain attributes of
the same name. In attribute summarization, the redundant attribute names of elements are eliminated, so the size of the
XML stream can be significantly reduced. Figure 4 illustrates the summarized attributes for G-node Country.

Figure 4: Attribute Summarization for G-node Country
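
The idea of attribute summarization can be sketched as follows: attribute names are stored once in an
attribute index, and the values of each attribute are stored contiguously in one flat value list. The function
name, the list-of-dicts input, and the use of list offsets as addresses are assumptions for illustration.

```python
def summarize_attributes(elements):
    """elements: attribute dicts of the elements sharing one location path, in document order.

    Returns (ai, avl): ai maps each attribute name to the offset where its values
    start in avl; avl holds the values of each attribute contiguously.
    """
    names = []
    for attrs in elements:
        for name in attrs:
            if name not in names:
                names.append(name)  # each attribute name is kept only once
    ai, avl = {}, []
    for name in names:
        ai[name] = len(avl)
        # A missing attribute is stored as an empty string in this sketch.
        avl.extend(attrs.get(name, "") for attrs in elements)
    return ai, avl

countries = [
    {"id": "c1", "name": "France"},
    {"id": "c2", "name": "Germany"},
]
ai, avl = summarize_attributes(countries)
print(ai, avl)
```

With n elements and a attributes, the names are stored a times instead of n × a times, which is where the
size reduction comes from.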

•     Wireless  Stream  Generation 

The server retrieves an XML document to be distributed from the XML repository (XML Data Repository 2012) and uses
SAX (Simple API for XML) (SAX 2004), an event-driven parser, to generate the stream. The streamed XML data is then
distributed via a broadcast channel.

Algorithm 1. Wireless XML stream generation

Input: A well-formed XML document D, TTL value
Output: Wireless XML stream XS

ContentHandler.startDocument()
    Path Stack PS = NULL;
    G-node Queue GQ = NULL;
    Set depth as 1 and nodeId as 0;

ContentHandler.startElement()
    Increase depth and nodeId;
    IF (path p of the current node e does not exist in PS) THEN
        Construct a new G-node G with tag name, AI, AVL;
        Push p into PS;
    ELSE
        Get the G-node G of path p from GQ;
        Add attribute values to AVL of G;
    END IF
    Set the depth-th parent id as nodeId;
    Add the (depth-1)-th nodeId to the parent list of G;
    Add nodeId to the element list of G;
    Enqueue G into GQ;

ContentHandler.endElement()
    Decrease depth;

ContentHandler.characters()
    Get the G-node G of path p from GQ;
    Add text content to TL of G;
    Enqueue G-node G into GQ;

ContentHandler.endDocument()
    WHILE (the end of GQ is not detected)
        Get the top entry G-node G and its child G-node Gc in GQ;
        Generate AI and TI of G;
        Compare the element list in G and the parent list in Gc;
        IF (a parent element exists) THEN
            Compute the number of elements n of the same parent in Gc;
            Add a 1-valued bit to Lineage Code(V) of Gc;
            Add the integer n to Lineage Code(H);
        ELSE
            Add a 0-valued bit to Lineage Code(V);
        END IF
        Set CI to the child G-nodes of G;
    END WHILE
    Flush the G-nodes in GQ into the wireless XML stream XS;
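
The element-grouping part of Algorithm 1 can be sketched with Python's standard xml.sax module. This is a
simplified illustration under stated assumptions: G-nodes are plain dictionaries keyed by location path, the class
and function names are hypothetical, and the index generation, Lineage Code computation and broadcasting steps of
endDocument() are left out.

```python
import xml.sax

class GNodeBuilder(xml.sax.ContentHandler):
    """Groups elements by location path, recording ids, parent ids, attributes and texts."""

    def __init__(self):
        super().__init__()
        self.path = []       # tag names from the root to the current element
        self.open_ids = []   # node ids of the currently open elements
        self.node_id = 0
        self.gnodes = {}     # location path -> G-node record (insertion-ordered)

    def _key(self):
        return "/" + "/".join(self.path)

    def startElement(self, name, attrs):
        self.path.append(name)
        self.node_id += 1
        g = self.gnodes.setdefault(
            self._key(), {"elements": [], "parents": [], "attrs": [], "texts": []})
        g["elements"].append(self.node_id)
        g["parents"].append(self.open_ids[-1] if self.open_ids else None)
        g["attrs"].append({n: attrs.getValue(n) for n in attrs.getNames()})
        self.open_ids.append(self.node_id)

    def characters(self, content):
        # SAX may deliver text in several chunks; this sketch stores each non-blank chunk.
        text = content.strip()
        if text and self.path:
            self.gnodes[self._key()]["texts"].append(text)

    def endElement(self, name):
        self.path.pop()
        self.open_ids.pop()

def build_gnodes(xml_text):
    handler = GNodeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.gnodes

SAMPLE = ('<mondial><country id="c1"><name>France</name></country>'
          '<country id="c2"><name>Germany</name></country></mondial>')
gnodes = build_gnodes(SAMPLE)
print(list(gnodes))
```

The recorded element and parent id lists are the inputs from which the Lineage Codes described in the next
section can be derived.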
LINEAGE  CODING 

Lineage Encoding (Park 2010) is a scheme that supports queries involving predicates. To denote the parent-child
relationships between XML elements in two G-nodes, two kinds of lineage codes are used: a vertical code, denoted by
Lineage Code(V), and a horizontal code, denoted by Lineage Code(H).

• Lineage Code Definition. Assume that a G-node C is a child of a G-node P. Let $E_C = \{c_1, c_2, \ldots, c_m\}$ and
$E_P = \{p_1, p_2, \ldots, p_n\}$ be the ordered sets of elements in C and P, respectively, where the elements are
ordered in document order. Let $g : E_C \to E_P$ be the function that maps an element in $E_C$ to its parent element
in $E_P$. The Lineage Code of the G-node C is defined by Lineage Code(V, H), where Lineage Code(V) is a bit string
$V = b_1 b_2 \ldots b_n$ with

$$b_i = \begin{cases} 1, & \text{if the } i\text{-th element } p_i \text{ in } E_P \text{ has at least one child element in } E_C \text{ (i.e., there exists } c_j \in E_C \text{ such that } p_i = g(c_j)\text{)} \\ 0, & \text{otherwise} \end{cases} \qquad (1 \le i \le n)$$

and Lineage Code(H) is an ordered list of positive integers $(n_1, n_2, \ldots, n_k)$, where $n_i$ $(1 \le i \le k)$ is
the number of elements in $E_C$ whose parent element is $p_j$ in $E_P$, with $j = f_V(i)$.

Here $f_V(i)$ is the position of the bit in the bit string V that is the i-th one among those with a value of 1.
Figure 5 shows an example of Lineage Codes in G-node country, G-node province, and G-node city. Lineage Code(V) of
G-node city marks which elements of its parent G-node have children among the elements integrated in G-node city.
Lineage Code(H) of a G-node denotes the number of child elements that are mapped to the same parent element, in
document order.
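
The definition can be checked with a short sketch. The function below derives (V, H) from the mapping g, given as
a list of 0-based parent positions; the function name and this input representation are assumptions made for the
example.

```python
def lineage_code(num_parents, parent_of):
    """Compute the Lineage Code (V, H) of a child G-node.

    num_parents: n, the number of elements in the parent G-node (E_P).
    parent_of:   for each child element c_j in document order, the 0-based
                 position in E_P of its parent g(c_j).
    """
    counts = [0] * num_parents
    for p in parent_of:
        counts[p] += 1
    v = "".join("1" if c > 0 else "0" for c in counts)  # b_i = 1 iff p_i has a child
    h = [c for c in counts if c > 0]                    # one count per 1-bit of V
    return v, h

# Three parent elements; the second one has no child in this G-node.
print(lineage_code(3, [0, 0, 2]))
```

Note that H has exactly as many entries as V has 1-bits, and the entries of H sum to the number of child
elements m.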

Figure 5: Lineage Codes of G-node Country, G-node Province, and G-node City

•     Selection  Function 

In evaluating a given query with predicates, we select a subset of elements of a particular type that satisfy
the given predicates and then find their child elements. A subset of the elements selected in a G-node can be
represented by a bit string, called a selection bit string (SB) for the G-node, where 1-valued bits identify the
selected elements. This section explains the selection functions that compute a selection bit string for a G-node.

Select Children(C, $SB_P$): Select Children is a function that obtains a selection bit string identifying a subset of
elements in a particular child G-node. Given a set S of elements in a G-node P identified by a selection bit string
$SB_P$, the function determines the subset of elements in a child G-node C that are children of the elements in S. A
selection bit string $SB_C$ for C is computed from the Lineage Code of C, (V, H), by applying the Shrink & Mask and
Unpack operators in order. Shrink & Mask(V, $SB_P$) first shrinks the bit string V by removing the bits with a value of
0. It also shrinks the bit string $SB_P$ by eliminating the bits in the same positions as those removed from V. Then it
calculates the bitwise AND of the two shrunken bit strings. For example,
Shrink & Mask(011110, 110011) = 1111 & 1001 = 1001.

Unpack(V', H), where $V' = v_1 v_2 \ldots v_k$ is the result of Shrink & Mask and $H = (n_1, n_2, \ldots, n_k)$,
returns the bit string obtained by concatenating the bit strings $S_1, S_2, \ldots, S_k$ in order, where $S_i$ is a bit
string of length $n_i$ in which all bits are equal to $v_i$. For example, Unpack(1001, (2, 2, 2, 2)) = 11000011.
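
The two operators and their composition can be sketched directly on bit strings. Representing bit strings as
Python str values of "0"/"1" characters and H as a list of integers is an assumption made for readability; the
function names are likewise illustrative.

```python
def shrink_mask(v, sb_p):
    """Shrink V and SB_P to the positions where V is 1, then AND the results."""
    if len(v) != len(sb_p):
        raise ValueError("V and SB_P must have the same length")
    keep = [i for i, bit in enumerate(v) if bit == "1"]
    v_shrunk = "".join(v[i] for i in keep)
    sb_shrunk = "".join(sb_p[i] for i in keep)
    return "".join("1" if a == "1" and b == "1" else "0"
                   for a, b in zip(v_shrunk, sb_shrunk))

def unpack(bits, h):
    """Repeat the i-th bit n_i times and concatenate the runs."""
    if len(bits) != len(h):
        raise ValueError("need one count in H per bit")
    return "".join(bit * n for bit, n in zip(bits, h))

def select_children(v, h, sb_p):
    """SB_C for a child G-node with Lineage Code (V, H), given SB_P of its parent."""
    return unpack(shrink_mask(v, sb_p), h)

print(shrink_mask("011110", "110011"))
print(unpack("1001", [2, 2, 2, 2]))
```

For instance, with Lineage Code ("11", [2, 2]) and a parent selection "10", select_children yields "1100":
only the two children of the first parent element remain selected.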

Select Parents(C, $SB_C$): Select Parents is a function that identifies the parent elements of a subset of elements
selected in a given G-node. Given a selection bit string $SB_C$ for a G-node C, Select Parents(C, $SB_C$) computes a
selection bit string $SB_P$ for the parent G-node of C. The selection bit string $SB_P$ is derived from the Lineage
Code of C, (V, H), by applying the Pack and Expand & Mask operators in order. Pack($SB_C$, H) shrinks the bit string
$SB_C = v_1 v_2 \ldots v_m$ using $H = (n_1, n_2, \ldots, n_k)$ to compute a bit string $V_m$ indicating the elements
in the parent G-node of C that are parents of the elements in C selected by $SB_C$. It returns the bit string
$V_m = r_1 r_2 \ldots r_k$, where $r_i = \text{Bitwise-OR}(v_{p+1}, v_{p+2}, \ldots, v_{p+n_i})$ and

$$p = \begin{cases} 0, & \text{if } i = 1 \\ \sum_{j=1}^{i-1} n_j, & \text{otherwise.} \end{cases}$$

For example, Pack(10010, (3, 1, 1)) = 110.

Expand & Mask(V, $V_m$) expands the bit string $V_m$ to the length of V by placing the bits of $V_m$, in order, at the
positions of the bits with a value of 1 in V and inserting bits with a value of 0 in the remaining positions.
Then it masks the bit string V with the expanded bit string. For example,

Expand & Mask(011010, 110) = 011010 & 011000 = 011000.
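
A minimal sketch of Pack, Expand & Mask and their composition follows, again assuming bit strings are str
values of "0"/"1" characters and H is a list of integers; the function names are illustrative.

```python
def pack(sb_c, h):
    """OR together each run of n_i bits of SB_C, one run per parent with children."""
    if sum(h) != len(sb_c):
        raise ValueError("the counts in H must sum to the length of SB_C")
    out, p = [], 0
    for n in h:
        out.append("1" if "1" in sb_c[p:p + n] else "0")
        p += n  # p is the running sum n_1 + ... + n_(i-1)
    return "".join(out)

def expand_mask(v, v_m):
    """Place the bits of V_m at the 1-positions of V, fill with 0, then AND with V."""
    if v.count("1") != len(v_m):
        raise ValueError("V_m needs one bit per 1-bit of V")
    bits = iter(v_m)
    expanded = "".join(next(bits) if bit == "1" else "0" for bit in v)
    return "".join("1" if a == "1" and b == "1" else "0"
                   for a, b in zip(v, expanded))

def select_parents(v, h, sb_c):
    """SB_P for the parent of a G-node with Lineage Code (V, H) and selection SB_C."""
    return expand_mask(v, pack(sb_c, h))

print(pack("10010", [3, 1, 1]))
print(expand_mask("011010", "110"))
```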

GetSelectionBitStringOf(N): This function selects the elements in a G-node contained in the query tree of a given
twig pattern query. An answer element is an element that satisfies the given predicate conditions. The recursive
function traverses the subtree rooted at G-node N in a post-order, depth-first manner. First, it evaluates a selection
predicate on the attributes or text of the elements in the G-node to obtain a selection bit string for the predicate.
For each child node C of an internal node N, the algorithm computes its selection bit string $SB_C$ recursively and then
calculates the selection bit string $SB_P$ for N, which is derived from $SB_C$ and the Lineage Code of C using the
SelectParents() function. The resulting selection bit string for N is then produced by performing bitwise AND
operations over all the selection bit strings $SB_P$ obtained from the child nodes of N.
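
The recursion can be sketched as below. The QueryNode record, its field names, and the convention that a node
without a predicate carries an all-ones predicate bit string are assumptions for the example; select_parents here
is a compact fused form of Pack followed by Expand & Mask.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QueryNode:
    predicate_bits: str                        # "1" where this G-node's own predicate holds
    v: str = ""                                # Lineage Code(V) relative to the parent G-node
    h: List[int] = field(default_factory=list) # Lineage Code(H)
    children: List["QueryNode"] = field(default_factory=list)

def select_parents(child, sb_c):
    """Pack SB_C by the runs in H, then expand onto the 1-positions of V."""
    hits, p = [], 0
    for n in child.h:
        hits.append("1" in sb_c[p:p + n])
        p += n
    it = iter(hits)
    # next(it) is consumed only at positions where V has a 1-bit.
    return "".join("1" if bit == "1" and next(it) else "0" for bit in child.v)

def bit_and(a, b):
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))

def get_selection_bit_string_of(node):
    """Post-order traversal: AND the node's own predicate with every child's SB_P."""
    sb = node.predicate_bits
    for child in node.children:
        sb_c = get_selection_bit_string_of(child)
        sb = bit_and(sb, select_parents(child, sb_c))
    return sb

# Two country elements; only the first has a name child satisfying the predicate.
name = QueryNode(predicate_bits="10", v="11", h=[1, 1])
country = QueryNode(predicate_bits="11", children=[name])
print(get_selection_bit_string_of(country))
```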

TWIG  PATTERN  QUERY  PROCESSING  (OVER  WIRELESS  XML  STREAM  IN  WIRELESS 
ENVIRONMENT) 

In this section, the paper describes how a mobile client retrieves the data of interest when processing a query
over the wireless XML stream. Assuming that there is no descendant axis in the user query, query processing for a twig
pattern query is presented in Section (A), and finding frequent data items is described in Section (B).

•     Twig  Pattern  Query  Processing 

Algorithm 2 shows twig pattern query processing over the wireless XML stream. Twig pattern query processing
involves three phases: the Tree traversal phase, the Subpaths traversal phase, and the Main path traversal phase. The
main path denotes a path from the root node to a leaf node, while the subpaths denote the branch paths other than the
main path in the query tree.

In the Tree traversal phase, the mobile client first constructs a query tree. Then, traversing the query tree, it
selectively downloads the group descriptors of the relevant G-nodes into the nodes of the query tree. Attribute values
and texts involved in the given predicates are also downloaded into the relevant nodes. In the Subpaths traversal phase,
the mobile client performs a post-order, depth-first traversal starting from the highest branching node in the query
tree using the GetSelectionBitStringOf() function. The selection bit string for the branching node is calculated from
all the subpaths in a bottom-up manner. In the Main path traversal phase, the selection bit string of the branching node
is propagated along the main path using the SelectChildren() function. Finally, the mobile client retrieves the set of
answer elements in the leaf node of the main path that satisfy the given twig pattern query.

Algorithm 2. Twig Pattern Query Processing

Input: Wireless XML stream DS, a twig pattern query Q
Output: Result set R satisfying Q

Result set R = ∅; // initialization
Initialize the selection bit string SB as 1;
Initialize the Lineage Code of the root G-node as (1, (1));
Initialize nextNode as the address of the root G-node in DS;

// Tree traversal phase
Construct a query tree T for Q;
REPEAT {
    Tune a group descriptor GD of the G-node indicated by nextNode;
    IF (current node CN is the leaf node in T) THEN
        Store AVL and TL into the node in T;
    ELSE
        IF (CN contains predicate conditions P) THEN
            Tune the relevant attribute values/text using AI/TI;
            Store the relevant attribute values/text into the node in T;
        END IF
        Assign the address of the next node in CI to nextNode;
    END IF
} UNTIL (all nodes in T are completely traversed)

// Subpaths traversal phase
Let N be the highest branching node in T;
SB_N = GetSelectionBitStringOf(N);

// Main path traversal phase
Let MP be the main path in T starting from N;
P = N;
SB_P = SB_N;
REPEAT {
    Let C be the child node of P in MP;
    SB_C = SelectChildren(C, SB_P);
    P = C; SB_P = SB_C;
} UNTIL (C is the leaf node)
Select the set R of elements in C by the selection bit string SB_C;
Return R;

Figure 6 shows the query processing steps for the twig pattern query
"/mondial/country[name/text()="France"]/province/city/population". In the Tree traversal phase, the mobile client
downloads the group descriptors of six G-nodes (i.e., G-node mondial, G-node country, G-node name, G-node province,
G-node city and G-node population). In the Subpaths traversal phase, the mobile client computes the selection bit string
of the subpath (10) using the GetSelectionBitStringOf() function. In the Main path traversal phase, the mobile client
applies the SelectChildren() function from the branching node (i.e., G-node country) to the leaf node
(i.e., G-node population).

Figure 6: Example of Twig Pattern Query Processing

•     Finding  Frequent  Data  Items 

The frequent items problem is one of the most heavily studied questions in data stream research. Given a
sequence of items, the problem is to find those items which occur most frequently. Typically, this is formalized as
finding all items whose frequency exceeds a specified fraction of the total number of items.
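
One common counter-based way to approximate this set in a single pass is the Misra-Gries style "Frequent"
algorithm, sketched below for a stream of query strings. The decrement step and the function name follow that
standard algorithm and are an assumption about what is intended here; the sample queries are hypothetical.

```python
def frequent_items(stream, k):
    """Keep at most k (item, counter) pairs while scanning the stream once.

    Any item occurring more than n/(k+1) times in a stream of length n is
    guaranteed to be among the returned keys; the counters are lower bounds
    on the true frequencies, so infrequent items may also appear.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1           # a free counter is allocated to the new item
        else:
            for key in list(counters):   # no free counter: decrement all of them
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]    # a zero counter becomes free again
    return counters

queries = ["/a/b", "/a/b", "/c", "/a/b", "/d", "/a/b", "/e", "/a/b", "/a/b", "/f"]
print(frequent_items(queries, k=2))
```

Because only k counters are kept, the working space is independent of the stream length, and each update
touches either one counter or, in the decrement case, all k of them.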

Algorithm 3 shows the counter-based algorithm. It stores k (item, counter) pairs. The natural generalization of the
algorithm is to compare each new item against the stored items T and to increment the corresponding counter if the item
is among them. Otherwise, if there is some counter with a zero count, it is allocated to the new item and the counter
is set to 1. A grouping argument is used to argue that any item which occurs more than n/k times must be stored by the
algorithm when it
Encoding (LE) is compared with other XML streaming methods. LE eliminates redundant attribute names, which further
reduces the size of the stream it generates. The stream generated by LE is the smallest even though LE contains
many indices, including location paths, Lineage Codes, CIs, AIs, and TIs. Although the generation time for LE is larger
than that of the PS and S-node approaches, the generation costs do not affect the access time or tuning time, because
the broadcast server only disseminates the pre-generated XML stream.

Table  2:  Mondial  Data  Stream  Generated  by  Wireless  Streaming  Methods  (Park  et  al.  2010) 


                        TS          EC          SD          DIX         LE
# of Tag Names          22,423      22,423      22,423      22,423      33
# of Attribute Names    47,423      47,423      47,423      47,423      47,423
# of Attribute Values   47,423      47,423      47,423      47,423      47,423
# of Texts              7,647       7,647       7,647       7,647       7,647
Size of the Indexes     2,693,219   3,192,163   358,768     269,076     40,631
Generation Time         203 ms      401 ms      499 ms      1,226 ms    967 ms

Figure 7 shows the access time evaluation results on the real XML data set. The access time is decided by two
factors: 1) the size of the data stream, and 2) a correct prediction ensuring early termination of query processing. As
shown in the figure, Lineage Encoding exhibits the best performance because it generates the smallest data stream by
eliminating redundant tag names and attribute names, and it terminates query processing earlier than the other wireless
XML streaming methods. Approaches such as S-node (Kaushik et al. 2004) and DIX (Park et al. 2009) explore the entire
stream to find the desired data dispersed over it. For TS and EC, the indices are significantly larger, and as a result
their access times are significantly larger than the others.

Figure 7: Access Time Evaluation on the Mondial Data Set
CONCLUSIONS 

In this paper, in addition to the energy-efficient and latency-efficient wireless XML streaming scheme that supports
queries in the wireless environment, frequent queries are determined. The counter-based algorithm plays a vital role
in identifying frequent queries. The frequency threshold considered did not affect update throughput, and the algorithm
remained fast. The space used by the algorithm at the finest accuracy level was less than 1 MB and the cost was 100
times less. This range of sizes is small enough to fit within a second-level cache.

In this paper, the proposed framework processes the twig pattern query for the mobile client by retrieving the
required data satisfying the given predicates through bitwise operations on the Lineage Codes in the relevant
G-nodes. In the future, we plan to analyze the issues that were not addressed in this paper. First, because
communication is unstable and network failures may occur in a wireless broadcasting environment, the indexing mechanism
should take network failures into account. Second, issues related to security were not considered.